Comparison of DNA extraction methods for 16S rRNA gene sequencing in the analysis of the human gut microbiome

The gut microbiome is widely analyzed using high-throughput sequencing, such as 16S rRNA gene amplicon sequencing and shotgun metagenomic sequencing (SMS). DNA extraction is known to have a large impact on metagenomic analyses. The aim of this study was to compare DNA extraction protocols for 16S sequencing. To this end, four commonly used DNA extraction methods were compared for the analysis of the gut microbiota. Commercial versions were evaluated against modified protocols using a stool preprocessing device (SPD, bioMérieux) upstream of DNA extraction. Stool samples from nine healthy volunteers and nine patients with a Clostridium difficile infection were extracted with all protocols and analyzed by 16S rRNA gene sequencing. Protocols were ranked using wet- and dry-lab criteria, including quality controls of the extracted genomic DNA, alpha-diversity, accuracy using a mock community of known composition, and repeatability across technical replicates. SPD improved the overall efficiency of three of the four tested protocols compared with their commercial versions, in terms of DNA extraction yield, sample alpha-diversity and recovery of Gram-positive bacteria. The best overall performance was obtained for the S-DQ protocol, i.e. SPD combined with the DNeasy PowerLyzer PowerSoil protocol from QIAGEN. Based on this evaluation, we strongly believe that the use of such a stool preprocessing device improves both the standardization and the quality of DNA extraction in human gut microbiome studies.


Results
Study design. In our study, four commercial DNA extraction protocols were evaluated based on the supplier's recommendations: the NucleoSpin Soil kit (Macherey-Nagel, named MN), the DNeasy PowerLyzer PowerSoil kit (QIAGEN, named DQ), the QIAamp Fast DNA Stool kit (QIAGEN, named QQ) and the ZymoBIOMICS DNA Mini kit (ZymoResearch, named Z). In order to facilitate the first steps of DNA extraction, they were also tested with an upstream stool preprocessing device, named SPD (see Supplementary Methods for detailed protocols). The resulting protocols were named as follows: S-MN stands for SPD + MN, S-DQ for SPD + DQ, S-QQ for SPD + QQ and S-Z for SPD + Z.
We analyzed fecal samples from nine healthy volunteers (CDI−) and nine patients suffering from Clostridium difficile infection (CDI+). A defined mixture of bacterial species (mock community) was also prepared and sequenced to assess the efficiency and accuracy of DNA extraction by comparing the observed bacterial abundances with the theoretical ones. DNA extraction protocols were compared using 16S rRNA gene amplicon sequencing for a total of 456 samples (18 fecal samples and 1 mock community, each extracted in triplicate with each of the eight protocols) (Fig. 1).
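The total of 456 follows directly from the design. As a minimal sketch, with hypothetical sample labels (only the counts of 9 + 9 stools, 1 mock, 8 protocols and 3 replicates come from the study), every extraction can be enumerated as a (sample, protocol, replicate) triple:

```python
from itertools import product

# Hypothetical labels; only the counts are taken from the study design.
samples = [f"CDI-_{i}" for i in range(1, 10)] + [f"CDI+_{i}" for i in range(1, 10)] + ["mock"]
protocols = ["MN", "DQ", "QQ", "Z", "S-MN", "S-DQ", "S-QQ", "S-Z"]
replicates = (1, 2, 3)

# One entry per DNA extraction that was sequenced
extractions = list(product(samples, protocols, replicates))

# Aliquots needed per stool sample: one per protocol x replicate
aliquots_per_sample = len(protocols) * len(replicates)

print(len(extractions))      # 456 (19 x 8 x 3)
print(aliquots_per_sample)   # 24
```

The same product explains the 24 aliquots prepared per stool sample.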
Quality and quantity of extracted DNA. When selecting a DNA extraction protocol, obtaining sufficient high-quality genomic DNA is desirable for preparing metagenomic libraries. In the present study, we evaluated DNA yield, DNA fragment size and DNA quality. A protocol that performs poorly on these criteria would likely skew the measured bacterial composition, as only a small fraction of the bacterial community present in the original sample would be analyzed. A summary of DNA extraction performance for all human fecal samples is presented in Supplementary Table 1.
Considerable variability was found in the extraction yield of the tested protocols (Fig. 2a), in line with previous studies 36 , and this variability did not depend on health status (Supplementary Fig. 1a). Except for MN, DNA extraction protocols combined with SPD seemed to recover as much or more DNA than their commercial versions. Notably, increases were observed for S-QQ (p-value < 0.1) and S-Z (p-value < 0.05), compared with QQ and Z, respectively. The same DNA yield was obtained for the DQ protocol with and without SPD (p-value > 0.1). SPD seemed to negatively affect the extraction yield when coupled with the MN protocol (p-value < 0.01). Of the eight extraction protocols tested, S-MN and Z recovered the lowest DNA concentrations, significantly below those of the other protocols. In practice, the best-performing protocol would be the one for which the highest number of samples could be prepared for sequencing. Here, for each protocol, we measured the percentage of samples whose DNA concentration was above 5 ng/µl, the minimal DNA concentration recommended to prepare 16S rRNA gene sequencing libraries (Table 1). In our hands, none of the tested protocols retrieved DNA above this threshold for all samples. Except for S-MN, the best performances were observed when the protocols were combined with SPD. S-Z recovered enough DNA material for 88% of samples, followed by MN (86%), S-QQ (82%) and S-DQ (81%).
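The percentages in Table 1 amount to counting extracts strictly above the 5 ng/µl threshold. A minimal sketch, in which the concentrations are invented for illustration and the function name is hypothetical:

```python
import numpy as np

LIBRARY_THRESHOLD_NG_PER_UL = 5.0  # minimal concentration cited in the text

def percent_above_threshold(concentrations, threshold=LIBRARY_THRESHOLD_NG_PER_UL):
    """Percentage of extracts whose DNA concentration is strictly above threshold."""
    c = np.asarray(concentrations, dtype=float)
    return 100.0 * np.count_nonzero(c > threshold) / c.size

# Invented concentrations (ng/µl) for one protocol, not study data
example = [12.3, 4.1, 8.8, 25.0, 5.0, 6.7, 3.2, 9.9]
print(f"{percent_above_threshold(example):.1f}%")  # 62.5%
```

Note that an extract at exactly 5.0 ng/µl is not counted here; whether the study treated the threshold as strict is an assumption.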
Regarding the fragment size of DNA, variations were also observed between the extraction protocols. The QQ and MN protocols yielded the shortest DNA fragments, with a median size of around 12,000 bp, which was shorter than S-QQ (p-value > 0.1) and significantly shorter than the other protocols (p-value < 0.01, Fig. 2b). The longest DNA fragments were observed for S-MN, with an average size of 21,000 bp, followed by DQ, S-DQ and Z, with DNA fragments of around 18,000 bp (p-value > 0.1). DNA fragments were significantly longer in CDI+ patients when extracted with MN (p-value ≤ 0.05), S-QQ or DQ (p-value < 0.01), whereas the other protocols yielded similar DNA fragment sizes regardless of health status (Supplementary Fig. 1b).
We also assessed DNA purity using the A260/280 ratio. A ratio of 1.8, which is generally accepted as "pure" for DNA, was observed for S-DQ (Fig. 2c). A ratio below 1.8 was observed for MN, S-MN, Z, S-Z and DQ, which may indicate the presence of protein, phenol or other contaminants. A ratio close to 2 was observed for QQ and S-QQ, suggesting the possible presence of RNA in the samples (p-value < 0.01 in comparison with the other protocols). Except for MN, the protocols combined with SPD generated DNA of equal or higher purity than their standard versions. Apart from QQ and Z, all protocols showed equivalent DNA purity between CDI+ and CDI− samples (Supplementary Fig. 1c).
Observed microbial diversity and performance in extracting Gram-positive bacteria. In addition to the wet-lab criteria, extraction quality was also evaluated from the 16S rRNA gene amplicon data by investigating the observed microbial diversity of samples (Fig. 3). This alpha-diversity has recently been described as a good indicator of DNA extraction performance, being positively correlated with the extraction of Gram-positive bacteria 35 . No significant difference in microbial diversity was observed between CDI+ patients and healthy volunteers (Supplementary Fig. 2). Previous studies have shown a significant decrease in microbial diversity in patients with recurrent CDI but not with initial CDI 69,70 . As considerable variability was found within each group of individuals, we corrected for the individual effect in the statistical model to emphasize differences between extraction protocols. The median alpha-diversity values were between 4.0 and 4.2 for all tested protocols (Fig. 3). Interestingly, alpha-diversity was equal or higher when samples were extracted with an SPD-associated protocol, except for MN, which performed better than S-MN according to 16S data (p-value ≤ 0.05) (Fig. 3).
Preliminary SMS data also showed higher alpha-diversity with SPD-associated protocols than with commercial protocols (Supplementary Fig. 3a).
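To put the median values of 4.0 to 4.2 in perspective, the Shannon index can be computed directly from per-taxon counts. The sketch below assumes the natural logarithm (the default of vegan's diversity() for the Shannon index) and does not reproduce the patient-effect correction done with limma; the function name is hypothetical:

```python
import math
import numpy as np

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)); zero counts are ignored."""
    c = np.asarray(counts, dtype=float)
    p = c[c > 0] / c.sum()
    return float(-np.sum(p * np.log(p)))

# A perfectly even community of n taxa has H' = ln(n), so exp(H') gives an
# "effective number" of equally abundant taxa.
even_60 = [100] * 60
print(round(shannon_index(even_60), 3))               # 4.094
print(round(math.exp(4.0)), round(math.exp(4.2)))     # 55 67
```

Under this reading, medians of 4.0 to 4.2 correspond to the diversity of roughly 55 to 67 equally abundant taxa, so small differences in H' reflect sizeable differences in the effective number of detected taxa.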
We then evaluated whether the observed diversity was associated with effective recovery of Gram-positive bacteria. For this purpose, we assessed the ratio of Firmicutes to Bacteroidetes, two main phyla commonly found in the gut microbiota, which are, for the most part, Gram-positive and Gram-negative, respectively. In theory, a protocol performing well for the extraction of Gram-positive bacteria should yield a higher Firmicutes/Bacteroidetes ratio 71 . Remarkably, this ratio was higher for all four protocols combined with SPD than for their standard versions, in both 16S and SMS data (Table 2 and Supplementary Table 2). To quantify the effect of SPD on microbial community composition more precisely, DESeq2 was used to test the differential abundance of taxa between standard and SPD-combined protocols. For each patient, the relative abundance of the Firmicutes phylum increased significantly, whereas that of the Bacteroidetes phylum decreased significantly, with the use of SPD. This analysis was also performed at the family level (Supplementary Fig. 4), where SPD led to a significant decrease of Gram-negative families and a significant increase of Gram-positive families (Supplementary Table 4). Altogether, our results are consistent with SPD increasing the observed alpha-diversity by improving the recovery of Gram-positive bacteria.
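Because the Firmicutes/Bacteroidetes ratio compares two parts of the same composition, it is linked to the CLR transformation used elsewhere in the analysis. This sketch uses invented phylum counts and hypothetical names; it does not reproduce the DESeq2 test, it only shows that the log of the ratio equals the difference of two CLR coordinates and is therefore independent of sequencing depth:

```python
import numpy as np

PHYLA = ["Firmicutes", "Bacteroidetes", "Proteobacteria"]
F, B = PHYLA.index("Firmicutes"), PHYLA.index("Bacteroidetes")

def clr(counts):
    """Centered log-ratio transform of strictly positive counts."""
    logx = np.log(np.asarray(counts, dtype=float))
    return logx - logx.mean()

def fb_ratio(counts):
    """Firmicutes/Bacteroidetes ratio from phylum-level counts ordered as PHYLA."""
    return counts[F] / counts[B]

# Invented phylum-level read counts for one stool sample, not study data
standard = np.array([4000, 5000, 1000])
with_spd = np.array([6000, 3500, 500])
print(fb_ratio(standard), round(fb_ratio(with_spd), 3))  # 0.8 1.714

# log(F/B) equals the difference between the two CLR coordinates, and
# multiplying every count by a constant (deeper sequencing) leaves it unchanged.
z = clr(standard)
print(np.isclose(z[F] - z[B], np.log(fb_ratio(standard))))      # True
print(np.isclose(fb_ratio(standard * 10), fb_ratio(standard)))  # True
```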
Extraction protocol accuracy. To estimate the accuracy of the extraction protocols, a mock community consisting of nine bacterial species of known relative abundances was prepared and sequenced. Protocol accuracy was estimated by calculating the Aitchison distance (the lower the distance, the better the prediction) between observed and expected abundances at the genus level (Fig. 4). Interestingly, bacterial abundances were better predicted using 16S than SMS (Fig. 4 and Supplementary Fig. 3b). Regardless of the sequencing method, these predictions were improved when SPD was used upstream of the QQ and MN protocols. Based on 16S rRNA gene data, DQ was the most accurate protocol, followed by S-MN, S-Z, Z and S-QQ. Detailed bacterial abundances at the genus level are plotted in Supplementary Fig. 4. As observed for human samples, SPD improved the recovery of Gram-positive bacteria compared with standard protocols. Discrepancies between expected and observed abundances seem mostly related to GC content 72 . With both approaches, genera with high GC content, such as Pseudomonas, tend to be overestimated, whereas genera with low GC content, such as Listeria, tend to be underestimated. However, this pattern is less visible with SPD-associated protocols.
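The Aitchison distance is the Euclidean distance between CLR-transformed compositions. A minimal sketch on a toy three-genus composition (not the study's nine-species mock), assuming strictly positive abundances; real profiles with zero counts would need a pseudocount or another zero-handling strategy, which is an assumption not specified here:

```python
import math
import numpy as np

def clr(x):
    """Centered log-ratio transform; inputs must be strictly positive."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def aitchison_distance(observed, expected):
    """Euclidean distance between CLR-transformed compositions."""
    return float(np.linalg.norm(clr(observed) - clr(expected)))

# Toy example: even expected composition, genus 1 over-represented.
expected = [1 / 3, 1 / 3, 1 / 3]
observed = [0.50, 0.25, 0.25]
print(round(aitchison_distance(observed, expected), 4))  # 0.566

# Scale invariance: raw read counts give the same distance as proportions.
print(round(aitchison_distance([200, 100, 100], expected), 4))  # 0.566
```

Scale invariance is the reason this distance suits the comparison: observed read counts can be compared with expected proportions without rarefying or normalizing to a common depth.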
Protocol repeatability. The eight protocols were next evaluated for repeatability, based on the variation in bacterial abundances between triplicates of the same stool sample (Fig. 5). We observed an increase in repeatability when the protocols were coupled with SPD compared with their standard versions, except for QQ, but this increase was not significant (p-value > 0.1). The median Aitchison distance decreased by a factor of 1.01 from QQ (14.99) to S-QQ (14.90), 1.08 from Z (13.44) to S-Z (12.40), 1.22 from MN (14.70) to S-MN (12.01) and 1.09 from DQ (15.30) to S-DQ (14.05). S-MN was the most repeatable protocol, closely followed by S-Z.
Protocols overall performance. In our study, eight DNA extraction protocols were evaluated using both wet- and dry-lab criteria, with 16S rRNA sequencing read-outs. To help in data interpretation, we ranked the protocols with a custom scoring system applied to each criterion based on the observed 16S rRNA gene profiling results (Fig. 6). For each criterion, a score of 0 (the worst result obtained in our dataset), 1 or 2 (the best result obtained in our dataset) was given. Protocols were given the same score if no significant difference was observed between them. These scores were then plotted on a spider chart, where a score of 0 corresponds to the center and a score of 2 to the vertex. The resulting areas were then used to help select the best overall performing DNA extraction protocol.
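The text does not state the number or order of the axes on the spider chart. Assuming equally spaced axes, the enclosed polygon is a fan of triangles, so its area can be computed from the 0/1/2 scores as sketched below (scores and function name are hypothetical):

```python
import math

def radar_area(scores):
    """Area of the polygon drawn on a spider chart with equally spaced axes.

    Consecutive axes are 2*pi/n apart, so the polygon is a fan of n
    triangles, each with area 0.5 * r_i * r_(i+1) * sin(2*pi/n).
    """
    n = len(scores)
    if n < 3:
        raise ValueError("a spider chart needs at least three criteria")
    wedge = math.sin(2 * math.pi / n)
    return 0.5 * wedge * sum(scores[i] * scores[(i + 1) % n] for i in range(n))

# Hypothetical scores (0, 1 or 2) on six criteria, not the study's values
print(round(radar_area([2, 2, 2, 2, 2, 2]), 3))  # 10.392
print(round(radar_area([2, 2, 2, 0, 0, 0]), 3))  # 3.464
print(radar_area([2, 0, 2, 0, 2, 0]))            # 0.0
```

The last two calls have the same total score but different areas, because each triangle multiplies the scores of neighbouring axes. Area-based ranking therefore depends on the axis order, which is one reason to read the areas alongside the per-criterion scores.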
The protocols Z and QQ performed better when combined with SPD than in their standard versions, whereas the performance of the MN protocol decreased when combined with SPD (Fig. 6). Protocols S-DQ and DQ showed equivalent overall performance for the represented criteria, with S-DQ showing higher microbial diversity and DQ improved accuracy. Among SPD-associated protocols, in our hands, S-DQ showed the best overall performance (Fig. 6a). Although other protocols showed similarly good results for some criteria, S-DQ was the only protocol ranked among the best-performing protocols for all tested criteria. S-DQ performed slightly worse than S-QQ and S-Z regarding DNA yield, but this difference was not significant (p-value > 0.05, Fig. 2a). Even if S-DQ was not the best protocol for this criterion, it produced enough DNA material for more than 80% of samples to prepare and sequence the metagenomic libraries. S-DQ was also less repeatable than S-Z and S-MN, but these slight differences were not significant (Fig. 5).
Considering the standard versions of the protocols, DQ had the best overall performance (Fig. 6b). This protocol performed well in terms of accuracy and extracted DNA yield and quality. MN performed significantly better than DQ for microbial diversity (p-value < 0.01, Fig. 3), but performed poorly on other criteria. Finally, MN, QQ and Z were slightly more repeatable than DQ, but not significantly (p-value > 0.1, Fig. 6).

Discussion
DNA extraction is a crucial step of the metagenomics workflow, known to be influenced by many parameters that are difficult to evaluate exhaustively. In addition to in-house protocols, new commercial solutions are now emerging, making it difficult to choose a suitable protocol for the gut microbiota. Benchmarking protocols is thus crucial to understand potential biases and to avoid errors during data interpretation. Recent gut microbiome studies compared various DNA extraction protocols but were limited to a small number of fecal samples, mainly from healthy individuals [27][28][29][30]37,73,74 . As a consequence, the performance of such protocols may not be guaranteed for a clinical cohort.
Our study is, to our knowledge, the first to compare four commercial DNA extraction protocols using 16S rRNA gene amplicon sequencing on a number of stool samples adequate for statistical analysis and biological conclusions (n = 18). In an effort to streamline fecal preparation prior to DNA extraction, the commercial protocols were also tested in combination with a stool preprocessing device. As recommended by recent studies, we also included a positive control, the mock community, so that we could reliably assess the accuracy of extraction protocols. The mock community was made up of nine bacterial species and processed alongside fecal specimens. The eight protocols tested were ranked based on wet- and dry-lab criteria. The global aim was to identify one method that performs well and generates the most accurate and reproducible data.
In addition to healthy donors, patients suffering from Clostridium difficile infection were also recruited, allowing us to test the protocols on samples with various microbial compositions, consistencies and biomasses. CDI is a pressing issue, as Clostridium difficile, a Gram-positive bacterium, is the leading cause of disease ranging from mild diarrhea to pseudomembranous colitis in hospitalized patients 75 . Fecal microbiota transplantation (FMT) is emerging as a new option for recurrent CDI 76 . Identifying which bacteria are already present (recipient) and which have been transferred (donor) is essential and requires highly sensitive, robust and fast metagenomics techniques 70,77 .
In our study, a total of 456 and 56 samples were analyzed using 16S rRNA gene sequencing and SMS, respectively, providing a substantial dataset for comparison. Although SMS is, as expected, more sensitive in bacterial detection, our findings indicate good agreement between the two sequencing methods. However, only one replicate of the SMS experiment was performed, and further validation is needed. Our data also show good agreement between the samples from the two groups of individuals. Interestingly, no single DNA extraction protocol performed best on all the criteria tested. However, not all differences were significant, and with the selection strategy described above, the standard DQ protocol and S-DQ appeared as the best-performing protocols among commercial and SPD-associated solutions, respectively, for extracting DNA from human fecal samples. The DQ protocol, with or without SPD, generated good-quality DNA in amounts compatible with subsequent library preparation: the extracted DNA concentration was above 5 ng/µl for 81% and 77% of samples using S-DQ and DQ, respectively. Regarding the dry-lab criteria, for 16S rRNA profiling, DQ showed better accuracy, whereas S-DQ combined the best results in terms of alpha-diversity, extraction of Gram-positive bacteria, repeatability and accuracy in bacterial detection.
Remarkably, the bioinformatics analysis also shed light on the added value of the stool preprocessing device for some extraction protocols. In our study, the protocols combined with SPD share the first steps of the procedure, which include shaking and mechanical lysis with 0.1 mm zirconia and silica beads. With this combination, we observed an increase in the observed alpha-diversity. Our results are in good agreement with Costea et al., who showed that these protocol parameters were positively associated with the observed diversity, which is a good indicator of efficient lysis 35 . Biased protocols are also known to overrepresent Gram-negative bacteria because of the inefficient lysis of Gram-positive bacteria. For the SPD-combined protocols, we observed an increase in the relative abundance of Gram-positive bacteria and a corresponding decrease in the relative abundance of Gram-negative bacteria, which led to an increase in the Firmicutes/Bacteroidetes ratio. SPD can therefore provide a more accurate characterization of the microbiota by reducing this bias. In terms of repeatability, SPD also showed promising results. This device would be of particular interest to limit variation when several experimenters, or even different labs in multicentric studies, perform DNA extraction. Other approaches, such as the OMNIgene•GUT system (DNA Genotek) or RNAlater (Thermo Fisher) preservation tubes, also yield more DNA than snap-frozen samples (Neuberger-Castillo et al., 2020), further highlighting the added value of sample preprocessing. Lastly, our in-house mock community, composed of both Gram-positive and Gram-negative bacterial cells, made it possible to benchmark the protocols in terms of bacterial abundance predictions. Our results demonstrate that SPD in combination with most of the tested protocols assesses bacterial abundances more accurately than the protocols in their standard versions. Comparing the SPD device used in this study with other sample preprocessing methods is required to establish a new standard method. Using such a device prior to DNA extraction may add costs, time and labor, but, from our perspective, obtaining unbiased and comparable microbiome data across labs and countries is well worth it.
In this study, we focused on sample preprocessing and commercial solutions for DNA extraction. However, several other steps, such as sample homogenization and library preparation, are also crucial for accurate microbial community profiling. We are also aware that the protocols may not all have been tested with optimal parameters. The commercial protocols were tested using the beads provided in the kit on a Retsch system for 5 min. In our hands, protocol Z was one of the worst performers according to wet-lab criteria. Zymo Research now recommends bead-beating protocols other than the one tested. As shown by Tourlousse et al., vigorous bead-beating regimes allow effective recovery of Gram-positive bacteria. Optimizing this step may, therefore, improve the extraction performance of all methods 37,78 . Similarly, the DNeasy PowerSoil kit (Catalog No. 12888-100), a previous version of the DNeasy PowerLyzer PowerSoil kit (Catalog No. 12855-100), was compared with other commercial solutions, including the NucleoSpin Soil kit, by Yang et al. In their hands, the QIAGEN protocol performed worse than the other protocols, unlike the more recent kit, which performed best in our study. This highlights the difficulty of establishing a gold standard for gut microbiome analysis given the numerous, ever-evolving protocols. Moreover, great progress is being made in the field of automated nucleic acid extraction. Assessing the performance of such systems would also be relevant in the scope of clinical studies.

Conclusion
We recommend the S-DQ protocol for extracting microbial DNA from human stool samples. Although we have only tested S-DQ on fecal samples, we expect that it may also work well with other types of microbiota samples, possibly with some modifications.
In addition to the DNA extraction protocol, sample preprocessing appears to be a new way to improve the overall performance of most DNA extraction protocols. We propose including stool preprocessing devices in future microbiome studies to streamline and standardize DNA extraction.

Methods
Ethics approval and consent to participate. Fecal samples used in this study correspond to leftover samples collected for diagnostic purposes. Each patient was informed regarding collection, storage and use for research activities. As this study fell outside the regulations related to clinical trials, a non-opposition statement was obtained from all subjects and was sufficient to process the fecal samples according to the French legal and medical ethical guidelines. Both collection and use of fecal samples for metagenomic analyses were authorized by the French Ministry of Higher Education, Research and Innovation (Declaration N°DC-2018-3240).
Table 3. Composition of the microbial mock community and culture conditions.

Stool samples. Fecal samples from nine healthy volunteers and nine patients with Clostridium difficile infection (CDI) were provided by a certified testing laboratory in France and tested for Clostridium difficile toxins. Upon reception, each fecal sample was freshly aliquoted into 24 tubes (8 protocols × 3 replicates) and frozen at −80 °C until extraction, as storage at −80 °C is known to keep the microbial community stable over long periods 79 .
Microbial mock community. The microbial mock community was prepared by mixing nine bacteria ( . These protocols were also tested in combination with a stool preprocessing device (SPD, #421061, bioMérieux 52 ). This device was designed to facilitate and standardize fecal sample preparation before nucleic acid extraction. It includes a spoon calibrated for a 200 mg sample and a vial containing a buffer for sample resuspension, glass beads for homogenization and two filters for retaining fecal debris. After 5 min of hands-on time, the filtrate is ready to use for downstream DNA extraction. The extraction protocols and the SPD procedure are detailed in Supplementary Methods. DNA was extracted in triplicate from fecal samples and from the mock community. The A260/A280 ratio was assessed using the DropSense 96 system (Trinean). Genomic DNA size was assessed using the Genomic DNA ScreenTape (#5067-5364, Agilent) on the 2200 TapeStation system (Agilent). DNA concentrations were estimated using the QuantiFluor One dsDNA kit (#E4870, Promega) with the GloMax system (Promega).
16S rRNA gene library preparation and sequencing. 16S […] called chimeric sequences. These sequences were removed using the DADA2 83 algorithm, which also merges paired-end reads and produces amplicon sequence variant (ASV) tables. Taxonomy was assigned to ASVs using the "feature-classifier" plugin with a SILVA classifier trained on the V3 and V4 regions (cls-silvaV34 84,85 ).

Shotgun metagenomic profiling.
After quality control with FastQC (v0.11.9), reads were trimmed and filtered based on sequence quality and length using fastp (v0.20.0) with the default parameters. Host DNA contamination was removed by mapping the filtered reads to the human reference genome GRCh37 using BBMap (v38.90) 86 . Clean reads were taxonomically classified using kraken2 (v2.1.1) 87 against the Unified Human Gastrointestinal Genome (UHGG) catalog 88 .
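The four shotgun preprocessing steps can be chained per sample. The sketch below only builds the command lines (it does not run them); file names, the output layout, the thread count and the function name are hypothetical, and the BBMap options for writing unmapped pairs (`outu1=`/`outu2=`) as well as the use of `ref=` rather than a prebuilt index should be checked against the installed BBMap version:

```python
from pathlib import Path

def preprocessing_commands(sample, r1, r2, human_ref, kraken_db, outdir, threads=8):
    """Build (but do not run) the command lines for one paired-end sample."""
    out = Path(outdir)
    trimmed1, trimmed2 = out / f"{sample}_trim_R1.fq", out / f"{sample}_trim_R2.fq"
    clean1, clean2 = out / f"{sample}_clean_R1.fq", out / f"{sample}_clean_R2.fq"
    return [
        # 1. Read quality report
        ["fastqc", "-o", str(out), r1, r2],
        # 2. Quality/length trimming and filtering with fastp defaults
        ["fastp", "-i", r1, "-I", r2, "-o", str(trimmed1), "-O", str(trimmed2)],
        # 3. Map to the human reference and keep only unmapped (non-host) reads
        ["bbmap.sh", f"ref={human_ref}", f"in1={trimmed1}", f"in2={trimmed2}",
         f"outu1={clean1}", f"outu2={clean2}"],
        # 4. Taxonomic classification against a UHGG-based Kraken 2 database
        ["kraken2", "--db", kraken_db, "--threads", str(threads), "--paired",
         "--report", str(out / f"{sample}.kreport"),
         "--output", str(out / f"{sample}.kraken"), str(clean1), str(clean2)],
    ]

for cmd in preprocessing_commands("S01", "S01_R1.fq", "S01_R2.fq",
                                  "GRCh37.fa", "uhgg_kraken2_db", "out"):
    print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists; keeping command construction separate from execution makes the pipeline easy to inspect and to log alongside tool versions.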
Statistical analysis. All analyses were performed in R (version 3.3.1). Microbiome compositional data were analyzed on centered log-ratio (CLR) transformed matrices using the clr function from the "compositions" R package. Repeatability was assessed by calculating the Aitchison distance between replicates of a condition for every patient. Alpha-diversity (Shannon index) was calculated for each sample using the vegan package. The taxonomic analysis of the mock community samples was performed by mapping their SMS and 16S data with bowtie2 (v.2.3.5.1) 89 against indexes built from the 9 expected species (Table 3). The accuracy of the protocols was evaluated on those samples by calculating the Euclidean distance between expected and predicted abundances after CLR transformation (i.e. the Aitchison distance) using the "philentropy" R package. Differentially abundant bacteria between protocols with or without SPD were identified using the DESeq2 package. For each criterion except alpha-diversity, the statistical significance of differences between protocols was computed with a pairwise Wilcoxon rank test. For multiple comparisons, p-values were corrected using the Benjamini-Yekutieli procedure, and adjusted p-values below 0.05 were considered statistically significant. Because alpha-diversity values varied greatly from one patient to another, the patient effect was controlled in a linear model using the "limma" package, and statistics were computed with the empirical Bayes method.
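The Benjamini-Yekutieli correction controls the false discovery rate under arbitrary dependence between tests, which suits pairwise comparisons that share protocols. A minimal sketch with invented p-values and a hypothetical function name, following the same step-up definition as R's `p.adjust(p, method = "BY")`; the pairwise Wilcoxon tests themselves are not reproduced:

```python
import numpy as np

def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values (step-up, capped at 1)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    ranks = np.arange(1, m + 1)
    c_m = np.sum(1.0 / ranks)          # harmonic factor for arbitrary dependence
    order = np.argsort(p)              # ascending p-values
    scaled = p[order] * m * c_m / ranks
    # Running minimum taken from the largest p-value downwards keeps monotonicity
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

# Invented p-values from three pairwise comparisons
print(np.round(benjamini_yekutieli([0.01, 0.02, 0.03]), 4))  # [0.055 0.055 0.055]
```

In this toy case, raw p-values that would all pass a 0.05 threshold no longer do after correction, illustrating how conservative the procedure is compared with Benjamini-Hochberg (which omits the harmonic factor c(m)).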

Data availability
The datasets generated during the current study are available in the BioProject database (ID PRJNA648321) at the following link: http://www.ncbi.nlm.nih.gov/bioproject/648321.